A Survey of Statistical Approaches to Preserving Confidentiality of Contingency Table Entries
Abstract
In the statistical literature, there has been considerable development of methods for releasing multivariate categorical data sets, where the releases take the form of marginal and conditional tables corresponding to subsets of the categorical variables. In this chapter we provide an overview of this methodology and relate it to the literature on the release of association rules, which can be viewed as conditional tables. We illustrate this with two examples. A related problem, "association rule hiding," is often independently studied in the database
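To make the link between association rules and conditional tables concrete, here is a minimal sketch. It is not taken from the chapter: the 2x2 counts and the function names are hypothetical. The confidence of a rule "A = a implies B = b" equals the entry P(B = b | A = a) of the conditional table, and its support equals the corresponding cell proportion.

```python
from fractions import Fraction

# Hypothetical 2x2 table of counts (illustrative data, not from the source):
# rows index variable A (0/1), columns index variable B (0/1).
table = [[40, 10],
         [20, 30]]

def conditional_given_row(counts):
    """Return the conditional table P(B = j | A = i) as exact fractions.

    Assumes every row total is positive.
    """
    return [[Fraction(c, sum(row)) for c in row] for row in counts]

def rule_support_confidence(counts, a, b):
    """Support and confidence of the rule 'A = a implies B = b'."""
    n = sum(sum(row) for row in counts)
    support = Fraction(counts[a][b], n)               # joint proportion
    confidence = Fraction(counts[a][b], sum(counts[a]))  # conditional frequency
    return support, confidence

cond = conditional_given_row(table)
support, confidence = rule_support_confidence(table, 1, 1)

# The rule's confidence is exactly one entry of the conditional table.
assert confidence == cond[1][1]
```

Exact fractions are used so that the equality between the rule's confidence and the conditional-table entry holds without floating-point rounding.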
Related resources
Statistical Disclosure Limitation with Released Marginals and Conditionals for Contingency Tables
The goal of statistical disclosure limitation is to develop methods and tools that, while preserving confidentiality, can provide access to useful statistical data, not just a few numbers. In this paper we consider releases from contingency tables in the form of marginal counts and observed conditional frequencies. We link data utility to log-linear models, and evaluation of disclosure risk to bo...
Bounds for Cell Entries in Two-Way Tables Given Conditional Relative Frequencies
In recent work on statistical methods for confidentiality and disclosure limitation, Dobra and Fienberg (2000, 2003) and Dobra (2002) have generalized Bonferroni-Fréchet-Hoeffding bounds for cell entries in k-way contingency tables given marginal totals. In this paper, we consider extensions of their approach focused on upper and lower bounds for cell entries given arbitrary sets of marginals a...
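For orientation, the classical Fréchet bounds in the simplest case, a two-way table with known row and column totals, can be sketched as below. This is only the standard two-way result, max(0, r_i + c_j - n) <= n_ij <= min(r_i, c_j); it is not the generalization to k-way tables or to conditional frequencies that the abstract refers to. The function name and the example totals are hypothetical.

```python
def frechet_bounds(row_totals, col_totals):
    """Classical Fréchet bounds for each cell of a two-way table of counts.

    Returns a nested list of (lower, upper) pairs, one per cell (i, j),
    given only the row totals and column totals.
    """
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must sum to the same grand total")
    return [
        [(max(0, r + c - n), min(r, c)) for c in col_totals]
        for r in row_totals
    ]

# Illustrative marginals (assumed, not from the source): grand total 100.
bounds = frechet_bounds([50, 50], [60, 40])
for i, row in enumerate(bounds):
    for j, (lower, upper) in enumerate(row):
        print(f"cell ({i}, {j}): between {lower} and {upper}")
```

A narrow interval for a small cell count signals potential disclosure risk, which is why such bounds are used to assess releases of marginal tables.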
Partial Information Releases for Confidential Contingency Table Entries: Present and Future Research Efforts
Tabular data have been a staple product for disseminating information derived from the confidential microdata that fuel social science research and inform policy decisions. This paper outlines recent results on disclosure risk assessment associated with the release of high-dimensional contingency tables, and discusses some related research problems. The main focus is the partial information rel...
Preserving confidentiality of high-dimensional tabulated data: Statistical and computational issues
Dissemination of information derived from large contingency tables formed from confidential data is a major responsibility of statistical agencies. In this paper we present solutions to several computational and algorithmic problems that arise in the dissemination of cross-tabulations (marginal sub-tables) from a single underlying table. These include data structures that exploit sparsity to su...
Web Systems That Disseminate Information But Protect Confidential Data
Statistical agencies have longstanding concern over confidentiality of their data [14, 15]. But agencies must also report information to the public. This tension between confidentiality and dissemination of statistical information is heightened by the emergence of the World Wide Web as a means of communication. On the one hand, confidentiality is threatened by advances in information technology...